The Bay Area is notorious for its high cost of living, beginning with the absurd amount of money we pay for rent. As a result, people have spread out to surrounding cities like Gilroy to avoid these high costs, accepting longer commutes in exchange. Zillow and HERE examined 34 cities across the nation to find the real-estate cost of commuting, using Zillow data to determine how much money you could save on rent by living 15 minutes further away. For places like Boston, they found that this 15-minute move corresponded to housing that was 13% less expensive on average. Another study used an empirical model to examine the trade-off individuals face between wages, housing prices, and commuting costs. This balance between time and money spent and saved helps explain population shifts toward rural areas.
To start this project, I mapped the commute times for people in the Bay Area and the surrounding counties. This included the nine Bay Area counties along with Santa Cruz, San Benito, San Joaquin, Stanislaus, Monterey, Sacramento, Yolo, Merced, Fresno, and San Luis Obispo. Here are the leaflet maps of each year’s (2016-2019) average commute times for each PUMA (Public Use Microdata Area) region.
For all four years, average commute times are consistently lower in the Sunnyvale/Cupertino/Mountain View regions. This is possibly because people who live there have the opportunity to work close by; many large tech companies are based there too!
## [1] "The average difference in commute times for all the counties from 2016 to 2019 was an increase in 1.63 minutes"
## [1] "The average difference in commute times for all the counties from 2017 to 2019 was an increase in 1.12 minutes"
## [1] "The average difference in commute times for all the counties from 2018 to 2019 was an increase in 0.55 minutes"
## [1] "The largest increase in average commute time from 2016 to 2019 was in PUMA code 01305 with a 6.28 increase in minutes which is in Contra Costa County (South)--San Ramon City & Danville Town."
## [1] "The largest decrease in average commute time from 2016 to 2019 was in PUMA code 07902 with a -3.99 decrease in minutes which is in San Luis Obispo County (East)--Inland Region."
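The year-over-year comparison in the printed lines above can be sketched roughly as follows. This is a minimal illustration, not the code that produced those lines: the data frame `puma_means`, its column names, and all of its values are hypothetical.

```r
# Hypothetical PUMA-level average commute times in minutes (made-up values).
puma_means <- data.frame(
  puma  = c("A", "B", "C"),
  y2016 = c(30.0, 25.0, 40.0),
  y2019 = c(32.0, 24.0, 45.0)
)

# Change in average commute time for each region.
puma_means$diff <- puma_means$y2019 - puma_means$y2016

# Average change across regions, plus the largest increase and decrease.
mean_diff    <- mean(puma_means$diff)
largest_up   <- puma_means[which.max(puma_means$diff), ]
largest_down <- puma_means[which.min(puma_means$diff), ]

print(round(mean_diff, 2))
print(as.character(largest_up$puma))
print(as.character(largest_down$puma))
```

Computing the difference per region first and averaging afterwards keeps the per-region changes available for finding the extremes.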
You can see the differences in overall average commute times over the years in the printed lines above. In addition, to see if there were differences in commute times over the years among all the counties, I plotted density plots for each year, shown below. We can see that most commute times are under 50 minutes, but they can be as long as 142 minutes. Looking at the rows with a 142-minute commute, many of them come from San Joaquin County and fewer from Bay Area counties. From the histogram of transportation methods to work, we can see that most people get to work via car, truck, or van.
I wanted to see if there was a way to predict someone’s access to a smartphone based on their commute time, rent, income, access to the internet, and access to running water. This was done using a quasibinomial logit model. The results are shown below. The first row of the output is each coefficient passed through the inverse-logit function, which maps a coefficient of 0 to 0.5; the Estimate column in the summary gives the same coefficients on the log-odds scale. Reading the outcome as not having a smartphone and holding all other variables constant, a higher ACCESS value (no internet access, or internet access without paying) and a higher RWAT value (no hot and cold running water) are both associated with higher odds of not having a smartphone. We can see this from the ACCESS estimate of 1.559 and the RWAT estimate of 1.703, which transform to about 0.83 and 0.85. The estimates for commute time (JWMNP), rent (RNTP), and income (PINCP) are very close to 0, so their transformed values are very close to 0.5. A value of 0.5 here means the coefficient is near zero per unit (one minute or one dollar), not that the probability of the outcome is 50%. Commute time and rent are still statistically significant, while income is not (p = 0.375).
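For reference, the standard inverse-logit function and the odds ratio for a single coefficient can be written as below. The symbols $\beta$ (a coefficient on the log-odds scale) and $p$ (its inverse-logit transform) are notation assumed here for illustration.

$$
p = \frac{e^{\beta}}{1 + e^{\beta}}, \qquad \text{odds ratio} = e^{\beta}
$$

With $\beta = 0$ this gives $p = 0.5$ and an odds ratio of 1, meaning no change in the odds. With $\beta = 1.559$ (the ACCESS estimate) it gives $p \approx 0.826$ and an odds ratio of about 4.75.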
## (Intercept) JWMNP RNTP PINCP ACCESS RWAT
## 0.01223195 0.49563133 0.49993144 0.49999996 0.82625052 0.84595888
##
## Call:
## glm(formula = SMARTPHONE ~ JWMNP + RNTP + PINCP + ACCESS + RWAT,
## family = quasibinomial(), data = model_bay_pums_com_2019)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9669 -0.3378 -0.2851 -0.2289 3.5392
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.391e+00 2.974e-02 -147.681 <2e-16 ***
## JWMNP -1.748e-02 8.944e-04 -19.538 <2e-16 ***
## RNTP -2.742e-04 1.683e-05 -16.294 <2e-16 ***
## PINCP -1.581e-07 1.784e-07 -0.887 0.375
## ACCESS 1.559e+00 1.657e-02 94.126 <2e-16 ***
## RWAT 1.703e+00 1.648e-01 10.334 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 1.045504)
##
## Null deviance: 55918 on 121303 degrees of freedom
## Residual deviance: 46191 on 121298 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 6
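To make the two scales concrete, here is a small sketch that takes the estimates printed in the summary above and converts them with base R's `plogis()` and `exp()`. The vector `coefs` is typed in by hand from that output, and the 30-minute and $1,000 rescalings are illustrative choices of mine, not part of the original analysis.

```r
# Coefficients copied from the printed summary of the SMARTPHONE model.
coefs <- c(JWMNP = -1.748e-02, RNTP = -2.742e-04, PINCP = -1.581e-07,
           ACCESS = 1.559, RWAT = 1.703)

# Inverse logit: 0 maps to 0.5, so values near 0.5 mean "coefficient near zero".
inv_logit <- plogis(coefs)

# Odds ratios: multiplicative change in the odds per one-unit increase.
odds_ratio <- exp(coefs)

print(round(inv_logit, 4))
print(round(odds_ratio, 4))

# A small per-unit coefficient can still add up over a realistic range:
# odds ratio for a 30-minute longer commute, and for $1,000 more in rent.
or_30_min   <- exp(30 * coefs[["JWMNP"]])
or_1000_usd <- exp(1000 * coefs[["RNTP"]])
print(round(c(or_30_min, or_1000_usd), 3))
```

Odds ratios are usually the easier quantity to report for a logit model, because they describe how the odds change per unit of a predictor rather than where a coefficient lands on a 0 to 1 scale.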
Estimates of a similar magnitude appear when I switch the model to see whether we can predict someone’s access to running water from these other factors. Again, the estimates for rent, commute time, and income are very close to 0 (transformed values close to 0.5), and here commute time and income are not significant at the 5% level. ACCESS and SMARTPHONE have larger positive coefficients (0.526 and 1.691).
## (Intercept) JWMNP RNTP PINCP ACCESS SMARTPHONE
## 0.0007714457 0.4997548090 0.5000618898 0.4999995405 0.6285166893 0.8444148012
##
## Call:
## glm(formula = RWAT ~ JWMNP + RNTP + PINCP + ACCESS + SMARTPHONE,
## family = quasibinomial(), data = model_bay_pums_com_2019)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.3191 -0.0605 -0.0511 -0.0486 3.8039
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.166e+00 1.480e-01 -48.426 < 2e-16 ***
## JWMNP -9.808e-04 3.151e-03 -0.311 0.7556
## RNTP 2.476e-04 5.745e-05 4.309 1.64e-05 ***
## PINCP -1.838e-06 1.054e-06 -1.744 0.0812 .
## ACCESS 5.259e-01 8.961e-02 5.868 4.41e-09 ***
## SMARTPHONE 1.691e+00 1.630e-01 10.377 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 0.9988955)
##
## Null deviance: 3764.1 on 121303 degrees of freedom
## Residual deviance: 3512.9 on 121298 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 9
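To see whether these factors could predict commute times, I divided the commute times by 142 (the longest commute in the data) so that the values fall between 0 and 1; a positive estimate then means longer commute times. The estimates and their inverse-logit transforms are shown below. Most of the transformed values are close to 0.5, so per unit they do not show a definitive increase or decrease in commute times, even though rent, income, and ACCESS are statistically significant. The one that stands out is SMARTPHONE, with an estimate of -0.617 (0.351 after the transform): a higher SMARTPHONE value is associated with a shorter scaled commute. If SMARTPHONE is coded as in the first model, where a higher value is read as not having a smartphone, this means people without a smartphone tend to report shorter commutes.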
## (Intercept) SMARTPHONE RNTP PINCP ACCESS RWAT
## 0.08840686 0.35050730 0.50002479 0.50000100 0.46991177 0.48421864
##
## Call:
## glm(formula = JWMNP/142 ~ SMARTPHONE + RNTP + PINCP + ACCESS +
## RWAT, family = quasibinomial(), data = model_bay_pums_com_2019)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1341 -0.4177 -0.3797 0.1405 2.5784
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.333e+00 1.840e-02 -126.835 <2e-16 ***
## SMARTPHONE -6.168e-01 3.000e-02 -20.561 <2e-16 ***
## RNTP 9.916e-05 4.713e-06 21.039 <2e-16 ***
## PINCP 4.005e-06 4.025e-08 99.521 <2e-16 ***
## ACCESS -1.205e-01 1.619e-02 -7.442 1e-13 ***
## RWAT -6.315e-02 1.232e-01 -0.513 0.608
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 0.278243)
##
## Null deviance: 31368 on 121303 degrees of freedom
## Residual deviance: 28442 on 121298 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
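The scaling step behind the model above can be sketched on simulated data. Everything in the data frame `toy` is invented for illustration; only the variable names JWMNP, PINCP, and RNTP and the cap of 142 minutes are taken from the text, and the sketch uses two predictors rather than the full set.

```r
set.seed(1)

# Simulated stand-in data; column names mirror the PUMS variables but the
# values are invented for illustration.
n <- 500
toy <- data.frame(
  JWMNP = pmin(round(rexp(n, rate = 1 / 25)), 142),  # commute minutes, capped at 142
  PINCP = round(runif(n, 0, 200000)),                # personal income in dollars
  RNTP  = round(runif(n, 500, 4000))                 # monthly rent in dollars
)

# Scale the response into [0, 1] so a logit link can be used.
max_commute <- max(toy$JWMNP)
toy$commute_share <- toy$JWMNP / max_commute

fit <- glm(commute_share ~ PINCP + RNTP,
           family = quasibinomial(), data = toy)

# Fitted values are on the scaled (0 to 1) axis; multiply back to get minutes.
fitted_minutes <- fitted(fit) * max_commute
print(summary(fitted_minutes))
```

Converting fitted values back to minutes makes the model's predictions easier to compare with the raw commute times than the coefficients alone.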
To narrow it down to two factors, I created a linear model of commute time on income. The R-squared is extremely low (0.0988): income explains only about 10% of the variation in commute times. The slope is statistically significant but small, about 0.8 minutes of additional commute per $10,000 of income.
##
## Call:
## lm(formula = JWMNP ~ PINCP, data = model_bay_pums_com_2019)
##
## Residuals:
## Min 1Q Median 3Q Max
## -103.302 -11.175 -8.901 6.025 131.889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.052e+01 7.354e-02 143.1 <2e-16 ***
## PINCP 8.122e-05 7.043e-07 115.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.45 on 121302 degrees of freedom
## Multiple R-squared: 0.0988, Adjusted R-squared: 0.09879
## F-statistic: 1.33e+04 on 1 and 121302 DF, p-value: < 2.2e-16
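The slope and the R-squared in an `lm()` summary answer different questions, which a short sketch on simulated data can show. The variables `income` and `commute` and the numbers used to generate them are assumptions chosen to resemble the output above, not the real PUMS records.

```r
set.seed(42)

# Simulated stand-in data: a weak positive relationship plus a lot of noise.
n <- 1000
income  <- runif(n, 0, 200000)
commute <- 10 + 8e-05 * income + rnorm(n, sd = 22)

fit <- lm(commute ~ income)

slope     <- coef(fit)[["income"]]    # minutes per additional dollar
r_squared <- summary(fit)$r.squared   # share of variance explained

# For a one-predictor model, R-squared equals the squared Pearson correlation.
r_from_cor <- cor(income, commute)^2

print(slope * 10000)   # minutes per additional $10,000 of income
print(r_squared)
print(r_from_cor)
```

A slope can be estimated very precisely, and so be highly significant, while the R-squared stays low because most of the variation in the response comes from other sources.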
I also tried to linearly model commute times with how much people pay in rent. The R-squared is even lower (0.016). As a note, I kept only records with rent greater than $0, since people with no rent may have purchased their home.
##
## Call:
## lm(formula = JWMNP ~ RNTP, data = graph_bay_pums_com_2019_rent)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.061 -14.130 -10.495 7.852 132.813
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.1736959 0.2477946 37.02 <2e-16 ***
## RNTP 0.0033044 0.0001244 26.57 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.11 on 42404 degrees of freedom
## Multiple R-squared: 0.01637, Adjusted R-squared: 0.01635
## F-statistic: 705.9 on 1 and 42404 DF, p-value: < 2.2e-16
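The rent filter described above could look something like the sketch below. The data frame `toy` and its values are hypothetical; it only illustrates that `subset()` keeps rows with a positive rent and also drops rows where rent is missing.

```r
# Toy records; RNTP is monthly rent in dollars (values invented for illustration).
toy <- data.frame(
  JWMNP = c(10, 45, 30, 60),
  RNTP  = c(0, 1800, 2500, NA)
)

# Keep only rows with a positive rent. subset() treats an NA condition as
# FALSE, so the row with missing rent is dropped along with the $0 row.
renters <- subset(toy, RNTP > 0)

print(nrow(renters))
```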
Alas, linearly modeling income on rent yielded a low R-squared as well (0.081).
##
## Call:
## lm(formula = PINCP ~ RNTP, data = graph_bay_pums_com_2019_rent)
##
## Residuals:
## Min 1Q Median 3Q Max
## -102059 -36923 -9388 19041 870508
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2617.4089 706.3781 -3.705 0.000211 ***
## RNTP 21.7121 0.3545 61.239 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 65870 on 42404 degrees of freedom
## Multiple R-squared: 0.08125, Adjusted R-squared: 0.08123
## F-statistic: 3750 on 1 and 42404 DF, p-value: < 2.2e-16
Lastly, to really visualize this data, I plotted a few factors below.
To conclude, I mapped the amount that people pay in rent each month. With this project, I was hoping to show that an increase in rent would decrease commute times and vice versa. However, from my regressions, this did not turn out to be the case: rent had a small positive association with commute time and explained very little of its variation. I think part of this may be due to the overgeneralization of the regional data. Averaging these commute times and rents loses the spatial detail that PUMS data can provide. I also restricted the rent analysis to people who have not purchased their home. In more rural areas people often purchase their home, whereas renting is more common in urban areas, so this filter may leave out many rural commuters.